Evaluation of Automatic Video Captioning Using Direct Assessment
We present Direct Assessment, a method for manually assessing the quality of
automatically-generated captions for video. Evaluating the accuracy of video
captions is particularly difficult because for any given video clip there is no
definitive ground truth or correct answer against which to measure. Automatic
metrics such as BLEU and METEOR, drawn from techniques used in evaluating machine
translation, compare automatic video captions against a manual caption; they
were used in the TRECVid video captioning task in 2016 but have been shown to
have weaknesses. The work presented here brings human assessment into the
evaluation by crowdsourcing how well a caption describes a video. We
automatically degrade the quality of some sample captions which are assessed
manually and from this we are able to rate the quality of the human assessors,
a factor we take into account in the evaluation. Using data from the TRECVid
video-to-text task in 2016, we show how our direct assessment method is
replicable and robust and should scale to settings where there are many
caption-generation techniques to be evaluated.
Comment: 26 pages, 8 figures
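Direct Assessment as used in machine translation evaluation commonly filters crowd workers by checking whether they score degraded items below the originals, and standardizes each worker's raw scores before averaging. The sketch below follows that idea under stated assumptions: the rating record layout, the pass threshold `min_fraction`, and the use of per-assessor z-scores are illustrative choices, not details given in the abstract.

```python
from collections import defaultdict
from statistics import mean, pstdev

# A rating is (assessor, system, item_id, score, kind); kind is "orig" for a
# system caption and "degraded" for an automatically degraded copy of the
# same caption. This record layout is a hypothetical choice for the sketch.


def reliable_assessors(ratings, min_fraction=0.8):
    """Keep assessors who usually score degraded captions below the originals.

    min_fraction is a hypothetical threshold, not the authors' setting.
    """
    pairs = defaultdict(dict)  # (assessor, item_id) -> {kind: score}
    for assessor, _system, item, score, kind in ratings:
        pairs[(assessor, item)][kind] = score
    wins, totals = defaultdict(int), defaultdict(int)
    for (assessor, _item), scores in pairs.items():
        if "orig" in scores and "degraded" in scores:
            totals[assessor] += 1
            if scores["degraded"] < scores["orig"]:
                wins[assessor] += 1
    return {a for a, n in totals.items() if wins[a] / n >= min_fraction}


def system_scores(ratings, keep):
    """Standardize each kept assessor's scores (z-scores), then average per system."""
    originals = [r for r in ratings if r[0] in keep and r[4] == "orig"]
    by_assessor = defaultdict(list)
    for assessor, _system, _item, score, _kind in originals:
        by_assessor[assessor].append(score)
    # Guard against an assessor who gave every caption the same score.
    stats = {a: (mean(v), pstdev(v) or 1.0) for a, v in by_assessor.items()}
    per_system = defaultdict(list)
    for assessor, system, _item, score, _kind in originals:
        mu, sd = stats[assessor]
        per_system[system].append((score - mu) / sd)
    return {s: mean(z) for s, z in per_system.items()}


ratings = [
    ("a1", "sysA", 1, 80, "orig"), ("a1", "sysA", 1, 40, "degraded"),
    ("a1", "sysB", 2, 60, "orig"), ("a1", "sysB", 2, 30, "degraded"),
    ("a2", "sysA", 1, 50, "orig"), ("a2", "sysA", 1, 70, "degraded"),
]
keep = reliable_assessors(ratings)  # a2 rated the degraded copy higher
scores = system_scores(ratings, keep)
print(keep, scores)
```

Standardizing per assessor removes differences in how harshly individual workers use the scale; a real deployment would more likely use a statistical significance test than a fixed fraction.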
Automatic skin segmentation for gesture recognition combining region and support vector machine active learning
Skin segmentation is the cornerstone of many applications such as gesture recognition, face detection, and objectionable image filtering. In this paper, we attempt to address the skin segmentation problem for gesture recognition. First, given a gesture video sequence, a generic skin model is applied to the first few frames to collect training data automatically. Then, an SVM classifier based on active learning is used to identify the skin pixels. Finally, the results are improved by incorporating region segmentation. The proposed algorithm is fully automatic and adapts to different signers. We tested our approach on the ECHO database; compared with other existing algorithms, our method achieves better performance.
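The abstract does not name its generic skin model. As an assumption, the sketch below uses the widely cited explicit RGB skin rule (uniform-daylight variant, commonly attributed to Kovač, Peer and Solina) to label pixels of the first frames automatically, and models the region step as a majority vote inside each segment. The SVM active-learning stage is not shown here, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def generic_skin_rule(r, g, b):
    """Explicit RGB skin rule; a stand-in for the paper's unnamed generic model."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def collect_training_data(frames):
    """Label every pixel of the first frames with the generic rule.

    frames: list of frames, each a list of (r, g, b) tuples.
    Returns (features, labels) with 1 = skin and 0 = non-skin.
    """
    features, labels = [], []
    for frame in frames:
        for rgb in frame:
            features.append(rgb)
            labels.append(1 if generic_skin_rule(*rgb) else 0)
    return features, labels


def refine_with_regions(pixel_labels, region_ids):
    """Give every pixel the majority label of its region (ties count as skin)."""
    votes = defaultdict(Counter)
    for label, region in zip(pixel_labels, region_ids):
        votes[region][label] += 1
    majority = {r: int(c[1] >= c[0]) for r, c in votes.items()}
    return [majority[r] for r in region_ids]


frame = [(200, 120, 90), (30, 30, 30), (210, 130, 100), (20, 20, 20), (25, 25, 25)]
features, labels = collect_training_data([frame])
# Hypothetical region map: the first three pixels form one segment.
refined = refine_with_regions(labels, region_ids=[0, 0, 0, 1, 1])
```

The region vote flips the isolated non-skin pixel inside the mostly-skin segment, which is the kind of correction region information can provide.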
A framework for sign language recognition using support vector machines and active learning for skin segmentation and boosted temporal sub-units
This dissertation describes new techniques that can be used in a sign language recognition (SLR) system, and more generally in human gesture systems. Any SLR system consists of three main components: Skin detector, Tracker, and Recognizer. The skin detector is responsible for segmenting skin objects like the face and hands from video frames. The tracker keeps track of the hand location (more specifically the bounding box) and detects any occlusions that might happen between any skin objects. Finally, the recognizer tries to classify the performed sign into one of the sign classes in our vocabulary using the set of features and information provided by the tracker.
In this work, we propose a new technique for skin segmentation using SVM (support vector machine) active learning combined with region segmentation information. Having segmented the face and hands, we need to track them across frames. We therefore developed a unified framework for segmenting and tracking skin objects and detecting occlusions, in which segmentation and tracking help each other: good tracking reduces the search space for skin objects, and accurate segmentation increases the overall tracker accuracy.
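The dissertation does not say which SVM trainer or query strategy it uses. A minimal sketch of SVM active learning, under the assumption of a linear SVM trained with Pegasos-style sub-gradient descent (a swapped-in trainer, not the author's) and uncertainty sampling, which queries the unlabeled samples closest to the decision boundary. Function names and parameters are hypothetical.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM (assumed trainer).

    xs: list of feature tuples; ys: labels in {-1, +1}.
    The bias is learned by appending a constant 1 feature (and is therefore
    regularized too, a simplification acceptable for a sketch).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (regularization), then step on the hinge loss if violated.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed score; positive means skin in this toy setup."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def query_most_uncertain(w, pool, k=1):
    """Uncertainty sampling: the k pool samples nearest the decision boundary."""
    return sorted(pool, key=lambda x: abs(decision(w, x)))[:k]


xs = [(-2.0,), (-1.5,), (1.5,), (2.0,)]
ys = [-1, -1, 1, 1]
w = train_linear_svm(xs, ys)
pool = [(-3.0,), (0.1,), (3.0,)]
to_label = query_most_uncertain(w, pool)  # sample whose label is requested next
```

Querying near the boundary concentrates labeling effort on the pixels the current classifier finds hardest, which is why active learning can reach good accuracy with few labeled samples.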
Instead of treating the whole sign as the unit of recognition, the sign can be broken down into elementary subunits, which are far fewer in number than the signs in the vocabulary. This motivated us to propose a novel algorithm to model and segment these subunits, and then to learn the informative combinations of subunits/features using a boosting framework. Our approach achieved a recognition rate above 90% using very few training samples.
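The boosting variant is not specified. A minimal sketch, assuming discrete AdaBoost with decision stumps and a hypothetical encoding in which each sign sample is a 0/1 vector marking which subunits are present, trained one sign class versus the rest. Each round selects the single most informative subunit feature, so the final model is a weighted combination of subunits.

```python
import math


def adaboost_stumps(xs, ys, rounds=10):
    """Discrete AdaBoost; weak learners are stumps on binary subunit features.

    xs: list of 0/1 feature vectors (1 = subunit present, a hypothetical encoding).
    ys: labels in {-1, +1} (one sign class vs the rest).
    Returns a list of (alpha, feature_index, polarity).
    """
    n, d = len(xs), len(xs[0])
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for polarity in (1, -1):
                # Stump predicts `polarity` when feature j is present, else -polarity.
                err = sum(w for w, x, y in zip(weights, xs, ys)
                          if (polarity if x[j] else -polarity) != y)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        if err >= 0.5:
            break  # no weak learner better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * (polarity if x[j] else -polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    score = sum(a * (p if x[j] else -p) for a, j, p in model)
    return 1 if score >= 0 else -1


xs = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
ys = [1, 1, -1, -1]
model = adaboost_stumps(xs, ys, rounds=5)
predictions = [predict(model, x) for x in xs]
```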
Creating a web-scale video collection for research
This paper begins by considering a number of important design questions for a
web-scale, widely available, multimedia test collection intended to support
long-term scientific evaluation and comparison of content-based video analysis and
exploitation systems. Such exploitation systems would include the kinds of functionality
already explored within the annual TRECVid benchmarking activity such as search, semantic
concept detection, and automatic summarisation.
We then report on our progress in creating such a multimedia collection, which we believe to be web scale and which will support a next generation of benchmarking activities for content-based video operations. Finally, we report on our plans for putting this collection, the IACC.1 collection, to use.
TRECVID 2008 - goals, tasks, data, evaluation mechanisms and metrics
The TREC Video Retrieval Evaluation (TRECVID) 2008 is a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last 7 years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. In 2008, 77 teams (see Table 1) from various research organizations (24 from Asia, 39 from Europe, 13 from North America, and 1 from Australia) participated in one or more of five tasks: high-level feature extraction, search (fully automatic, manually assisted, or interactive), pre-production video (rushes) summarization, copy detection, or surveillance event detection. The copy detection and surveillance event detection tasks were run for the first time in TRECVID 2008.
This paper presents an overview of TRECVID 2008.
Multimodal Classification of Urban Micro-Events
In this paper we seek methods to effectively detect urban micro-events. Urban
micro-events are events which occur in cities, have limited geographical
coverage and typically affect only a small group of citizens. Because of their
scale these are difficult to identify in most data sources. However, by using
citizen sensing to gather data, detecting them becomes feasible. The data
gathered by citizen sensing is often multimodal and, as a consequence, the
information required to detect urban micro-events is distributed over multiple
modalities. This makes it essential to have a classifier capable of combining
them. In this paper we explore several methods of creating such a classifier,
including early, late, and hybrid fusion, and representation learning using
multimodal graphs. We evaluate performance on a real world dataset obtained
from a live citizen reporting system. We show that a multimodal approach yields
higher performance than unimodal alternatives. Furthermore, we demonstrate that
our hybrid combination of early and late fusion with multimodal embeddings
performs best in classifying urban micro-events.
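The fusion strategies named above differ in where modalities are combined. A minimal sketch, assuming each modality already yields a numeric feature vector and using a hypothetical nearest-centroid scorer as a placeholder for the paper's real classifiers: early fusion concatenates features before classification, late fusion averages per-modality class probabilities, and the "hybrid" here simply averages the two outputs. The paper's hybrid also uses multimodal graph embeddings, which are not shown.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def centroid_classifier(train_x, train_y, n_classes):
    """Hypothetical stand-in: softmax over negative squared distance to class centroids."""
    dim = len(train_x[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in zip(train_x, train_y):
        counts[y] += 1
        sums[y] = [s + v for s, v in zip(sums[y], x)]
    centroids = [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

    def predict_proba(x):
        return softmax([-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids])
    return predict_proba


def early_fusion(modalities_train, y, n_classes):
    """Concatenate per-modality features, then train a single classifier."""
    joint = [sum(parts, []) for parts in zip(*modalities_train)]
    clf = centroid_classifier(joint, y, n_classes)
    return lambda parts: clf(sum(parts, []))


def late_fusion(modalities_train, y, n_classes):
    """Train one classifier per modality and average their probabilities."""
    clfs = [centroid_classifier(m, y, n_classes) for m in modalities_train]

    def predict(parts):
        probs = [clf(p) for clf, p in zip(clfs, parts)]
        return [sum(col) / len(col) for col in zip(*probs)]
    return predict


def hybrid_fusion(modalities_train, y, n_classes):
    """One simple hybrid: average the early- and late-fusion probabilities."""
    early = early_fusion(modalities_train, y, n_classes)
    late = late_fusion(modalities_train, y, n_classes)
    return lambda parts: [(a + b) / 2 for a, b in zip(early(parts), late(parts))]


text_feats = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
image_feats = [[0.0], [0.1], [1.0], [0.8]]
labels = [0, 0, 1, 1]
classify = hybrid_fusion([text_feats, image_feats], labels, n_classes=2)
probs = classify([[0.95, 1.0], [0.9]])
predicted = probs.index(max(probs))
```

Early fusion lets one model see cross-modal interactions, while late fusion tolerates a missing or noisy modality more gracefully; combining them is one way to get some of both.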
An Investigation into the Labor Market Behavior and Characteristics of Emirati Unemployed
The strong and robust growth of the United Arab Emirates (UAE) over the past decade has significantly raised the standards of living in the country and has brought remarkable economic and social transformations. However, there is some concern that strong output growth has yet to translate into equivalent job growth for UAE citizens, particularly outside the public sector and among young nationals. A careful estimate puts the number of unemployed Emiratis at the end of 2011 at 34,750, of whom 72 percent are women and 65 percent are youth. Among the youth, the percentage of unemployed females is 70 percent. In 2010 the Emirati unemployment rate was estimated at 14 percent: 8 percent among males and 28 percent among females. In 2011 the unemployment rate was estimated at 12.8 percent; the highest rate was in Al Fujairah (19.5 percent), followed by Abu Dhabi (15.1 percent), and the lowest was in Dubai (7 percent).
HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do
In this paper we propose a new evaluation challenge and direction in the area
of High-level Video Understanding. The challenge we are proposing is designed
to test automatic video analysis and understanding, and how accurately systems
can comprehend a movie in terms of actors, entities, events, and their
relationships to each other. A pilot High-Level Video Understanding (HLVU)
dataset of open-source movies was collected for human assessors to build a
knowledge graph representing each movie. A set of queries will be derived
from the knowledge graph to test systems on retrieving relationships among
actors, as well as reasoning and retrieving non-visual concepts. The objective
is to benchmark whether a computer system can "understand" non-explicit but obvious
relationships the same way humans do when they watch the same movies. This is a
long-standing problem that is being addressed in the text domain, and this
project moves similar research to the video domain. Work of this nature is
foundational to future video analytics and video understanding technologies.
This work can be of interest to streaming services and broadcasters hoping to
provide more intuitive ways for their customers to interact with and consume
video content.
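The abstract describes a per-movie knowledge graph from which relationship queries are derived. A minimal sketch, assuming the graph is stored as (subject, relation, object) triples and a relationship query is answered by breadth-first search for the shortest chain of relations between two entities. The entity names, relation labels, and query form are hypothetical; the HLVU annotation schema and query types are not given here.

```python
from collections import deque


def build_graph(triples):
    """Index (subject, relation, object) triples in both directions."""
    graph = {}
    for s, rel, o in triples:
        graph.setdefault(s, []).append((rel, o))
        graph.setdefault(o, []).append((f"inverse:{rel}", s))
    return graph


def relation_path(graph, start, goal):
    """Breadth-first search for the shortest chain of relations from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # no connection in the graph


# Hypothetical annotations for one movie.
triples = [
    ("Anna", "sister_of", "Ben"),
    ("Ben", "works_for", "Clara"),
    ("Clara", "lives_in", "Harbor Town"),
]
graph = build_graph(triples)
path = relation_path(graph, "Anna", "Clara")
```

A two-hop answer such as "Anna's brother works for Clara" is the kind of non-explicit relationship a viewer infers without it ever being stated on screen.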
The Impact of the Labor policy on demographics
Because the United Arab Emirates (UAE) relies heavily on imported workers, the impact of its labor policies on demographics cannot be overstated. The number and the types of workers admitted into the UAE every year, and the duration of their stay, directly affect the demographic profile of the nation's population in terms of size, growth, age, gender, race, health, and nationality, as well as socioeconomic status such as education and income. Policies that continue to encourage importing a young, uneducated, and low-paid workforce from abroad would only exacerbate the existing gender and ethnic imbalance in the population, as such workers tend to be male, single, and from a few South Asian countries. By contrast, labor policies that encourage the use of more skilled knowledge workers are more likely to bring in people from more diverse ethnic backgrounds and with a more balanced distribution across gender and age. Labor policies also affect demographics through their impact on marital and family relationships, as higher-paid workers are more likely than low-paid laborers to bring their families to the UAE or to start one in the country. The impact of labor policies on the demographics of the local population is significant too, most likely through their effect on female employment and the cost of living, which in turn affect local people's marriage patterns and fertility rates.
- …